PREFACE


This report describes work performed under Contract DACA76-88-C-0008 by Hughes Research
Laboratories, Malibu, California 90265 for the U.S. Army Engineer Topographic Laboratories (ETL),
Fort Belvoir, Virginia 22060-5546 under the sponsorship of the Advanced Concepts and Technology
(ACT) Committee, U.S. Army LABCOM, Adelphi, Maryland 20738-1145. The Contracting Officer’s
Technical Representatives at ETL were Rosalene M. Holecheck and Kevin Muliane. The ACT point of
contact is Dr. R. Gonano.


Yearly Technical Progress Report 1990
Contract No. DACA76-89-C-0002
Cooperative Autonomous Agents Testbed


Charles Dolan
David Payton
Karel Zikan


1 Background and Scope

The goal of this project is to study methods by which multiple autonomous
agents can be made to interact effectively. The primary emphasis is on mobile,
automated, and autonomous agents and on problem domains of interest to the
Army. To this end, we intend to study problem domains that incorporate
both cooperation and competition between agents. By creating a testbed in
which teams of autonomous agents can interact, we will be able to study how
individual agents must cooperate within a team in order to compete, as a group,
with an adversary team. Cooperative scenarios allow us to examine the
relationship between centralized control and distributed decision-making.

Competition is the primary method used in evaluating various planning
architectures and implementations. By having teams of agents compete against
one another, the strengths and weaknesses of different planning methods can be
compared and analyzed. The performance of the agents can be incrementally
improved in an evolutionary process in which the best features are extracted
from existing agents and synthesized into new agents.

There are four primary issues to be addressed in creating a useful testbed
for autonomous agents:

1. Find a problem domain where human agents currently work cooperatively
but where there is future potential for autonomous agents.

2. Develop a formalism for expressing planning problems in the problem domain.

3. Create algorithms that generate plans using the formalism.

4. Design agents that can use the plans to operate in a changing environment.

Because of the limited scope of this effort, when choosing a problem domain,
we pay close attention to the cost of simulating autonomous agents in the world.

This report covers a five (5) month technical effort. Although a contract
was awarded for a 24-month technical effort, by agreement with the contract
monitor, work was stopped after five (5) months to allow Hughes to develop
some general-purpose simulation tools under Hughes IR&D. Those simulation
tools have been completed, and work under the contract is continuing. The work
using the new simulation tools falls outside the period covered by this report.

1.1 Guide to the report

In Section 2 we define a graph formalism for looking at multi-agent resupply
problems. This formalism allows us to draw on a great deal of mathematics from
economics and optimization, as described in Section 3. In addition, we have
found that graph-based resupply problems can be better analyzed by decomposing
graphs into simple circular flows, as described in Section 4. In Section
5 we cover our discussions with Army personnel about the applicability of our
graph formalism to resupply problems today. Our future work is described in
Section 6.

2 Resupply as a multi-agent planning problem

We began by defining a multi-agent problem domain which provides a suitable
environment for analysis while also presenting a level of complexity which will
help maintain the relevance of this work to real-world problems. The domain
we developed combines a basic resupply problem with aspects of strategic
predation. The types of competition between agents and the degree of dynamics
in the environment can be varied for experimentation with different types of
scenarios. The primary advantage of this is that we can vary the complexity
of the problem without making radical changes to the environment. Thus,
planning skills developed in simpler environments can remain applicable as the
complexity of the problem domain is increased.

Figure 1 shows a sample resupply problem. Note that natural terrain
obstacles, roads, and safe-conduct corridors restrict the movement
of agents. Therefore, one view of the resupply problem is as maximizing the flow
from producers to consumers along a network or graph.

Our view of the resupply domain is fairly general and is intended to reflect
problems ranging all the way from the delivery of parts on a factory floor to
the delivery of weapons to battle fronts. Simply put, autonomous agents in
our environment will attempt to carry objects from producers to consumers
by the most efficient means possible. Their travel will be restricted to limited
pathways, so agents must coordinate their activities to avoid interfering
with one another.

Figure 1: Resupply routes are constrained both by the varying traversability of
terrain and by crossing points for natural barriers.

This same domain may be applied to the study of strategic predation
problems. To do so, we will allow two teams to compete within the same
environment and provide each team with the ability to cut off the supply lines of
its opponent. Figure 2 shows predation as part of the resupply problem. By
establishing a superior level of supplies at a battle front site, a team may win
battles and capture territory, thereby interfering with enemy supply routes. By
surrounding a battle site, a team can reduce the opponent’s ability to supply
the battle with needed resources, eventually winning that battle and
gaining new territory. The effort to surround, however, will entail the creation
of new battle fronts, which will put the aggressor at greater risk of loss. In this
way, the environment combines the need for efficient supply strategies with the
need for strategic planning of predatory operations.

Our observation of the strong movement constraints imposed on resupply
vehicles leads us to conclude that this domain can be effectively represented
as a graph. Figure 3 shows a graph superimposed on the terrain features from
Figures 1 and 2. By casting the problem domain in the language of graphs, we
can bring to bear a number of mathematical tools. But is this formalism
too limited for studying cooperation among teams of agents?

We feel that the graph view of resupply maps very intuitively onto
many problems. Of course, for problems (or portions of problems) for which
graphs are not a good match, we can place nodes on the terrain with
arbitrarily fine granularity, connected on a quad or hex grid. While this is not
recommended for large problems, saturating a small area with nodes can be
used to represent small open areas of relatively unrestricted mobility.

Figure 3: Natural movement constraints and opportunities for preying on
enemy supply routes allow us to represent a realistic multi-agent problem domain
as a graph structure.

2.1 Evaluation of Planning Algorithms

The levels of task complexity achievable in our domain are important because
they define the types of activities that will be required of our autonomous agents.
Our domain has three levels of complexity:

1. Preassigned sources and destinations for each resupply vehicle.

2. Dynamic assignment of sources and destinations.

3. Exchange of commodities en route under agent direction.

At the simplest level, we have preassigned sources and destinations between
which agents must travel in order to perform their resupply task. Since agents
are modeled as trucks that transport supplies, even the simplest environment
presents the problem of optimizing the simultaneous and opposing flows
of empty and full trucks. The next level of complexity is achieved by allowing
agents to select their own sources and destinations for pickup and delivery of
supplies. This gives agents the ability to adapt to changing rates of production
and consumption. A further level of complexity arises when we allow agents to
exchange payload at intermediate locations along their routes. A team can use
such exchanges to improve the efficiency of a delivery route.

In order to create competition between teams, we can consider reflecting a
network about the delivery sites and installing a team in each half-network to feed
the same delivery sites. We can now think of these delivery sites as battles. By
comparing the relative contribution of supplies from each team at a given battle
site, we can obtain a measure of victory or loss at that site. Since both teams
will attempt to win as many battles as possible, we have a suitable environment
for competitive evaluation.
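One way to make such a measure concrete, sketched below under our own assumptions, is to score each battle site by the share of its total delivered supply that comes from one team, so that 0.5 means parity. Treating ownership as a continuous share agrees with the continuous node ownership shown in Figure 5, but the function names `battle_share` and `tally_battles` and the strict-majority rule for a win are hypothetical, not taken from the report.

```python
def battle_share(supply_a: float, supply_b: float) -> float:
    """Fraction of the total supply at a battle site contributed by team A.

    Hypothetical measure: 1.0 means team A supplies everything, 0.5 means parity.
    """
    total = supply_a + supply_b
    if total <= 0:
        return 0.5  # nothing delivered by either team: treat the site as contested
    return supply_a / total


def tally_battles(sites):
    """Count sites won by each team, given (supply_a, supply_b) rates per site."""
    wins_a = wins_b = 0
    for supply_a, supply_b in sites:
        share = battle_share(supply_a, supply_b)
        if share > 0.5:
            wins_a += 1
        elif share < 0.5:
            wins_b += 1
        # share == 0.5 is parity: neither team wins the site
    return wins_a, wins_b


print(tally_battles([(3.0, 1.0), (1.0, 1.0), (0.5, 2.0)]))  # (1, 1)
```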

Figures 4-6 show an example competitive scenario. In
Figure 4, the cross-hatched team is devoting more resources to the two
conflict nodes in the large rectangle. In the resupply domain, this corresponds
to sending more fuel and ammunition to a particular part of a battle. Figure
5 shows the cross-hatched team gaining control of two nodes. Figure 6 shows the
gray team’s response, sending more supplies to the heavily contested nodes. We
can encourage agents to anticipate the enemy by making the cost of regaining
parity at a node larger than the cost of maintaining parity.

A further complication to this scenario is to allow battles to move as they
are won or lost. In this way, a team can encroach upon the territory of the
other by winning a sufficient number of battles in a given area. Furthermore,
by pushing a battle into the opponent’s territory, a team may cut off existing
supply lines so that other ongoing battles are more easily won. In this way, the
resupply problem leads naturally to strategic predation.

Figure 4: The ownership of a node by a team is shown by its style of
shading. Nodes in contention are shaded with both styles. Circular shaded
nodes indicate sources of supplies; square nodes indicate conflicts.

Figure 5: By shifting more of its supply activity to a node, one team can take
a contended node for itself. Node ownership is a continuous variable, as shown
by the change in the amount of cross-hatching of the nodes in the highlighted
area of the graph.

Figure 6: When one team changes the amount of supplies flowing to a node,
the other team is forced to react or yield the contended node.

Figures 7 and 8 demonstrate how strategic predation works in a graph-based
problem. In Figure 7, conflict occurs along arcs connecting nodes held
by different teams. This change from our convention in Figures 4-6 allows two
important phenomena to be modeled: (1) concentrating force and (2) cutting
supply lines. In Figure 8, the gray team has advanced one node at
the top of the graph. This effectively cuts off one of the cross-hatched nodes
and also concentrates force on it. This small scenario shows how competition
among teams makes resupply on a graph a dynamic, competitive environment.

One important objective of this contract is to be able to quantitatively
evaluate different planning methodologies. The scenarios described above let
us compare two different teams of autonomous agents either
by ranking them according to a set of absolute performance measures or by
noting their success in direct competition. Both the resupply and the strategic
predation domains are amenable to this kind of evaluation. In the resupply
domain, it should be possible to measure the overall efficiency of a team in
meeting a fixed demand. By comparing the efficiencies of different teams, we
will be able to determine which team performs best in different situations. The
resupply domain also affords various opportunities to subject teams to
direct competition. In these scenarios, the actions of one team will necessarily
be influenced by the actions of the other. The team that can enact the best
strategies while responding most readily to change will inevitably be the most
successful. As we move into strategic predation scenarios, evaluation
through direct competition becomes even more critical. In these tests, a team’s
environment will be almost entirely defined through interaction with the
opposing team, so there would be no way to evaluate the team’s performance in
isolation.

Figure 7: In a slight change of notation, fully shaded circular nodes indicate
supply sources, doughnut-shaded circular nodes indicate nodes through which a
team’s supplies may safely pass or be cached, and square nodes indicate conflicts.
Note that in this figure, conflict occurs along an arc connecting two square
nodes. This notation allows us to represent “cutting supply lines” in the graph.

Figure 8: By sending extra supplies to the top gray node, the gray team can
move forward along that conflict arc.

2.2 Internalized Plans for Multi-agent problems

As more planning researchers work on mobile robots, it is becoming clear that
standard subgoal-based plans are not appropriate for agents in dynamic, noisy
environments. The top portion of Figure 9 shows a standard subgoal-based
plan for an autonomous land vehicle (ALV) from (Daily et al. 1988). The ALV,
shown in the lower right-hand corner of the figure, is controlled from a radio
tower, so the plan avoids an RF shadow caused by a rock outcrop. When
this plan was actually executed, the vehicle accidentally went into the supposed
RF shadow while avoiding a collision with the rock outcrop. The RF shadow
turned out to be no problem, but the vehicle still headed toward the subgoal
at the top of Figure 9, even though a much shorter route would have been to
go directly to the main goal.

This problem can be solved by a technique called internalized plans (Payton
1988). The bottom half of Figure 9 shows an internalized plan for the ALV. The
internalized plan is a gradient field showing the best direction to move from any
point on the map. The vehicle then uses the internalized plan together with
its sensor input to navigate. This combination of internalized plans and low-level,
sensor-based control makes the vehicle much more flexible and robust. In
general, internalized plans require no extra computation, because all of those states
(the grid points in Figure 9) must already be examined to form the original plan.
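A minimal sketch of an internalized plan on a grid follows, assuming a 4-connected grid of cells with a per-cell entry cost, obstacles marked `None`, and Dijkstra's algorithm run backward from the goal; the report does not say how the field in Figure 9 was computed, and the grid below is a toy example. The single backward search already visits every reachable cell, which is why the direction field comes at essentially no extra cost. Every free cell gets a direction, so a vehicle that drifts off its intended route simply follows the arrow from wherever it now is instead of returning to a stale subgoal.

```python
import heapq

INF = float("inf")
# 4-connected moves, labeled by compass direction (row 0 is the top of the map).
MOVES = {(-1, 0): "N", (1, 0): "S", (0, -1): "W", (0, 1): "E"}


def cost_to_goal(cost, goal):
    """Dijkstra's algorithm run backward from the goal.

    cost[r][c] is the assumed cost of entering cell (r, c); None marks an obstacle.
    Returns dist[r][c], the cheapest cost of reaching the goal from each cell.
    """
    rows, cols = len(cost), len(cost[0])
    dist = [[INF] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0.0
    heap = [(0.0, goal)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale entry; a cheaper path was already found
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] is not None:
                nd = d + cost[r][c]  # step from (nr, nc) into (r, c)
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist


def gradient_field(cost, goal):
    """Best move from every free cell: the internalized plan as a direction map."""
    dist = cost_to_goal(cost, goal)
    rows, cols = len(cost), len(cost[0])
    field = [["#" if cost[r][c] is None else "." for c in range(cols)]
             for r in range(rows)]
    field[goal[0]][goal[1]] = "G"
    for r in range(rows):
        for c in range(cols):
            if field[r][c] != ".":
                continue
            best = None
            for (dr, dc), name in MOVES.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] is not None:
                    via = cost[nr][nc] + dist[nr][nc]  # cost if we step there first
                    if best is None or via < best[0]:
                        best = (via, name)
            if best is not None and best[0] < INF:
                field[r][c] = best[1]  # unreachable cells keep "."
    return field


grid = [
    [1, 1, 1, 1],
    [1, None, None, 1],
    [1, 1, 1, 1],
]
for row in gradient_field(grid, goal=(0, 3)):
    print("".join(row))
# EEEG
# N##N
# NEEN
```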

In the multi-agent resupply domain, static dispatch orders are the equivalent
of waypoint-based plans in the single-agent navigation example. In the
multi-agent case, however, the drawbacks of static plans are even more severe.
When a set of dispatch orders is created, certain assumptions are made about
who needs what. Because there are many, possibly competing, agents in the
environment, the situation changes after the orders are issued. If an agent has
no ability to sense the environment and make decisions, trips will be wasted. In
addition, a static plan is only as good as the information from which it is
constructed. A single agent has too limited a view to create a good static plan,
and creating a coherent, central view from distributed inputs is also problematic.

We need to consider three requirements when formulating an internalized
plan for a new problem domain:

1. Does the plan contain information that can be used to direct action?

2. Can an agent with a local view of the world correlate real-world sensory
input to the plan in order to determine its applicability?

3. Can an agent with a local view of the world use sensory input to update
the internalized plan?
[Figure 9 panels: top, “Standard plan with waypoints as subgoals”; bottom, “Gradient field as an internalized plan.”]

Figure 9: By representing a plan as a gradient field, instead of as a set of
waypoints, an autonomous vehicle has more flexibility in using cues from the
environment to direct its actions. Without this flexibility, time can be wasted
achieving meaningless subgoals.


For the resupply problem, we believe a graph flow representation is
appropriate. A graph flow is simply the traversability graph with probabilities on
each arc. Each agent can carry two graph representations, one for its own tasks
and one for the entire team. A deterministic agent always takes the
highest-probability arc; a nondeterministic agent chooses randomly, biased by
the probabilities. An agent can watch the traffic on the routes it uses and
check whether it matches the team's plan. If the traffic does not match the plan,
the agent can update its own graph.
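The two agent types can be sketched as follows. This is an illustration under our own assumptions: the plan at a node is a dictionary from outgoing arcs (named here by the neighboring node) to probabilities, and `traffic_matches_plan`, with its fixed `tolerance`, is a hypothetical stand-in for the traffic comparison described above, not a rule given in the report.

```python
import random


def choose_arc(arc_probs, deterministic=True, rng=random):
    """Pick the next arc out of the current node from a graph-flow plan.

    arc_probs maps each outgoing arc (here, just the neighbor's name) to the
    probability the plan assigns to it.
    """
    if deterministic:
        # A deterministic agent always takes the highest-probability arc.
        return max(arc_probs, key=arc_probs.get)
    # A nondeterministic agent samples arcs in proportion to their probabilities.
    arcs = list(arc_probs)
    return rng.choices(arcs, weights=[arc_probs[a] for a in arcs], k=1)[0]


def traffic_matches_plan(observed_counts, arc_probs, tolerance=0.15):
    """Hypothetical check: are observed arc shares within tolerance of the plan?"""
    total = sum(observed_counts.values())
    if total == 0:
        return True  # nothing observed yet, so no evidence against the plan
    return all(
        abs(observed_counts.get(arc, 0) / total - p) <= tolerance
        for arc, p in arc_probs.items()
    )


plan = {"depot_b": 0.7, "bridge": 0.2, "ford": 0.1}
print(choose_arc(plan))                                          # depot_b
print(choose_arc(plan, deterministic=False, rng=random.Random(7)))
print(traffic_matches_plan({"depot_b": 5, "bridge": 5}, plan))   # False
```

When the check fails, the agent would revise its own copy of the arc probabilities; how to revise them is left open here, as it is in the text.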


3 Using the Mathematics of Economics

We have taken a number of ideas from economics in designing our cooperative
autonomous agents testbed. We use these ideas because an economic system is
an example of an environment where an overall goal is established by a central
authority (a central bank), but the dynamics of the system are implemented
by autonomous agents. In addition, much of the mathematics of economics
deals with the flow of goods governed by supply and demand. Because supply
and demand are quantities that an autonomous agent can sense, they are good
candidates for representation in an internalized plan. Therefore, the mathematics
of economics is useful even in domains where agents are not truly self-seeking,
economically minded entities.

Because we address resupply problems, where agents are carriers that
repeatedly deliver goods to their “customers,”¹ it is natural to model the interactions
of autonomous cooperating agents as a “market.” This model may be
decomposed into three related but separate issues:

1. What should the flows be for efficient delivery?

2. What is a good price and reward changing mechanism to govern the market?

3. What is a good (individualistic) response mechanism for agents to react
to changing market conditions?

Issue (1) concerns the “global planning” (optimization) of the system;
issues (2) and (3) concern the two dual (in the sense of mathematical equilibrium
programming) faces of “engineering” and “execution” of the plan. The first
subsection below describes a variant of simulated annealing for determining a good set of
flows on the graph. The second subsection gives two different price-setting rules
that guarantee smooth, gradual changes in agent behavior as the situation
departs from the one expected when the initial flows were set.

¹ For instance, they may deliver materials to factory workers, or they may deliver
supplies and ammunition to military units.
3.1 Setting Prices

In order to formulate a plan that can serve as a resource for coordinating a
team's activities, the individual agents are replaced by a “market,” a single
hypothetical super-agent that aggregates all agents. The market responds to a set
of prices and rewards by returning an efficient flow. In this sense, the market
is just another function: the prices and rewards are the input, and the flows are
the output. The response is “greedy” in that it produces a flow that optimizes the
market’s rate of return. We give a nondeterministic Bayesian method that is
capable of returning the approximate market response for a very general class
of typical nonlinear optimization problems arising in this setting. This
approach seems justified by the great generality of these problems (general
nonlinear complementarity problems); of course, when warranted by
special circumstances, the Bayesian method can be replaced by an appropriate
direct method: linear complementarity techniques, convex programming,
separable convex or quadratic network flow programming, or linear network flow
programming.

Assume that there are $N$ available agents that can transport $K$ distinct
commodity types on a graph. Let $f = [f_{ij}^k]$ denote the vector (matrix) of flows
(units/second), where $f_{ij}^k$ denotes the flow of commodity $k = 1, 2, \ldots, K$ from
node $i$ to node $j$. If we denote by $f_{ij}^0$ the flow of empty carriers across the arc
$(i,j)$, then we can define

$$\phi_{ij} = \sum_{k=0}^{K} f_{ij}^k, \qquad (1)$$

the total flow from $i$ to $j$.

Let $c(\phi) = [c_{ij}(\phi)] > 0$ denote the vector of time delays (seconds), where
$c_{ij}(\phi)$ denotes the time required to traverse the arc $(i,j)$ from node $i$ to node
$j$. We assume that all $c_{ij}(\phi)$ are increasing (more precisely, nondecreasing) in
each coordinate; however, we do not assume separability, convexity, or any other
additional structure. The time delay function $c$ is the fundamental property of
the network.

Note that $F_{ij} = c_{ij}(\phi)\,\phi_{ij}$ is the total (expected) number of agents traversing
the arc $(i,j)$ (units), and that we have a feasibility-of-flows condition,

$$\sum_{(i,j)} F_{ij} = \sum_{(i,j)} c_{ij}(\phi)\,\phi_{ij} \le N, \qquad (2)$$

which needs to be added to the usual conditions of conservation of flows.
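The bookkeeping in (1) and (2) can be sketched as below. The relation $F_{ij} = c_{ij}(\phi)\phi_{ij}$ is Little's law: the expected number of agents on an arc equals the rate at which they enter it times the time each spends on it. The two-arc network, the numbers, and the linear congestion model `congested_delay` are hypothetical, chosen only so the arithmetic is easy to check.

```python
def total_flows(f):
    """Eq. (1): phi_ij = sum of f_ij^k over k = 0..K (k = 0 is empty carriers).

    f maps an arc (i, j) to its per-commodity flows [f^0, f^1, ..., f^K],
    in units per second.
    """
    return {arc: sum(per_commodity) for arc, per_commodity in f.items()}


def agents_on_arcs(phi, delay):
    """F_ij = c_ij(phi) * phi_ij, the expected number of agents on each arc.

    delay(arc, phi) returns the traversal time in seconds and may depend on the
    whole flow vector phi, as the text allows.
    """
    return {arc: delay(arc, phi) * phi[arc] for arc in phi}


def is_feasible(phi, delay, n_agents):
    """Eq. (2): the agents needed on all arcs cannot exceed the fleet size N."""
    return sum(agents_on_arcs(phi, delay).values()) <= n_agents


# Hypothetical two-arc network: a loaded run A -> B and an empty return B -> A.
flows = {("A", "B"): [0.0, 0.5, 0.25], ("B", "A"): [0.75, 0.0, 0.0]}
base_time = {("A", "B"): 10.0, ("B", "A"): 8.0}


def congested_delay(arc, phi):
    # Assumed congestion model: travel time grows linearly with the arc's own flow.
    return base_time[arc] * (1.0 + phi[arc])


phi = total_flows(flows)
print(phi)                                        # {('A', 'B'): 0.75, ('B', 'A'): 0.75}
print(agents_on_arcs(phi, congested_delay))       # {('A', 'B'): 13.125, ('B', 'A'): 10.5}
print(is_feasible(phi, congested_delay, n_agents=24))  # True
```

With 23.625 agents needed in expectation, a fleet of 24 suffices, while a fleet of 20 would violate (2).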

Let us now turn to our Bayesian simulated annealing computational method.
Assume that $E(f)$ is an “energy” function that incorporates the utility that $f$
would bring to the customer, as well as a penalty for violating the conservation
of flows conditions.
Simulated annealing is characterized by a “temperature” parameter $T > 0$.
If $E_1$ is the energy of the incumbent solution $f^1$, and if $E_2$ is the energy of
the newly proposed flow $f^2$, then we accept $f^2$ as the new incumbent with
probability

$$P(E_2 - E_1) = \min\left\{1, \exp\left(-\frac{E_2 - E_1}{T}\right)\right\}. \qquad (3)$$

Note that $f^2$ is always accepted when $E_2 < E_1$.
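The acceptance rule (3) is the standard Metropolis criterion and can be sketched as below. The surrounding `anneal` loop is a generic illustration only: its geometric cooling schedule and its uniform random-step proposal are our assumptions and stand in for the biased Bayesian proposal mechanism described in Section 3.1.2; the one-dimensional toy energy is likewise hypothetical.

```python
import math
import random


def accept(e_incumbent, e_proposed, temperature, rng=random):
    """Eq. (3): accept with probability min{1, exp(-(E2 - E1) / T)}."""
    delta = e_proposed - e_incumbent
    if delta <= 0:
        return True  # a proposal that is no worse is always accepted
    return rng.random() < math.exp(-delta / temperature)


def anneal(energy, propose, x0, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Generic annealing loop; the geometric cooling schedule is an assumption."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(steps):
        y = propose(x, rng)
        e_y = energy(y)
        if accept(e, e_y, t, rng):
            x, e = y, e_y
            if e < best_e:
                best_x, best_e = x, e
        t = max(t * cooling, 1e-12)  # keep T strictly positive
    return best_x, best_e


# Toy 1-D energy with its minimum at x = 3 and a uniform random-step proposal.
best_x, best_e = anneal(lambda x: (x - 3.0) ** 2,
                        lambda x, rng: x + rng.uniform(-1.0, 1.0),
                        x0=0.0)
print(round(best_x, 2), round(best_e, 4))
```

Early on, when $T$ is large, uphill moves are often accepted, which lets the search escape poor regions; as $T$ shrinks the loop becomes essentially greedy.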

In addition to the energy function, we also need a means of generating
$f^2$ that is biased toward directions that have been shown to be promising.
Therefore, we divide our discussion into two parts: the energy function and
the generation of new flows.

3.1.1 The energy function

The energy function can be any function that is lower for flows that better
serve the customer’s needs. To capture the idea of the customer’s needs, we
associate with each node an exchange function that gives the relative value of
each commodity in terms of all the other commodities.

In a multi-commodity market, a single set of flows can accommodate a
number of assignments of goods to agents. Therefore, to find the value of a set of
flows, we optimize over all possible assignments of goods to agents or,
equivalently, all possible assignments of goods to the flows. Of course, we can only
achieve this optimal use of a set of flows if agents have the ability to exchange
commodities en route.

If the user’s utility for each product at each location is approximately linear,
then it is best to associate with each node $\ell$ an exchange matrix $\delta^\ell$ such that
$\delta^\ell_{ii} = 0$ for all $i$ and $\ell$. These matrices are the aforementioned prices and rewards,
the inputs to the market response mechanism; $\delta^\ell_{km}$ specifies the cost or payoff
of exchanging item $k$ for item $m$ at node $\ell$. For instance, $\delta^\ell_{k0}$ represents
the payoff for selling a unit of good $k$, and $\delta^\ell_{0k}$ represents the cost of buying a
unit of good $k$. Impossible or undesirable transactions can be represented
by large costs. For a given flow $f$, define the supply vectors $s^\ell$ by

$$s^\ell_k(f) = \sum_i f_{i\ell}^k, \qquad (4)$$

and the demand vectors $d^\ell$ by

$$d^\ell_k(f) = \sum_j f_{\ell j}^k. \qquad (5)$$

The usual conservation of flows is satisfied if and only if $e^T s^\ell(f) = e^T d^\ell(f)$
($e$ is a vector of all ones) at every node. However, rather than enforcing the
conservation of flows (a difficult task when the flows are selected by a random
mechanism), we allow flows to violate the conservation condition. A penalty
proportional to the excess flow is then incurred. Mathematically, this is handled
by creating a dummy commodity $K+1$, which can be either bought or sold at
each node for a high price.

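Let the residual flow at node $\ell$ be denoted by $r^\ell(f) = e^T s^\ell(f) - e^T d^\ell(f)$,
and let

$$\delta^\ell_{i(K+1)} = \delta^\ell_{(K+1)j} = M_\ell \qquad (6)$$

for all $i, j = 0, 1, \ldots, K$. Typically, the $M_\ell$ are large numbers. At each node, solve
the transportation problem

$$
\begin{aligned}
T^\ell(f) = \min_x \; & \sum_{i=0}^{K+1} \sum_{j=0}^{K+1} \delta^\ell_{ij}\, x_{ij} \\
\text{subject to} \; & \sum_{j=0}^{K+1} x_{ij} = s^\ell_i, && i = 0, 1, \ldots, K, \\
& \sum_{i=0}^{K+1} x_{ij} = d^\ell_j, && j = 0, 1, \ldots, K, \\
& \sum_{j=0}^{K+1} x_{(K+1)j} = \max\{0, -r^\ell\}, \\
& \sum_{i=0}^{K+1} x_{i(K+1)} = \max\{0, r^\ell\}, \\
& x_{ij} \ge 0, && i, j = 0, 1, \ldots, K+1.
\end{aligned}
$$

With $r^\ell$ defined as inflow minus outflow, an excess of inflow ($r^\ell > 0$) is disposed of
through the dummy column and a shortfall ($r^\ell < 0$) is made up from the dummy row, so
total supply equals total demand and the problem is always feasible. The solutions of
these problems provide the locally best transactions for the given set of flows. Therefore

$$E(f) = \sum_\ell T^\ell(f)$$

is the linear energy (utility) that can be assigned to the flows.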
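The right-hand sides of each node's transportation problem can be computed as sketched below; the problem itself is then a small linear program that any LP solver can handle. The sketch assumes flows are stored per arc as lists indexed by commodity (index 0 for empty carriers), and the names `node_balance` and `dummy_totals` and the example flows are hypothetical. It also checks the balance that keeps the transportation problem feasible: once the dummy row and column totals are included, total supply equals total demand at the node.

```python
def node_balance(f, node, num_commodities):
    """Supply s^l (eq. 4), demand d^l (eq. 5) and residual r^l at one node.

    f maps an arc (i, j) to [f^0, ..., f^K]; commodity 0 is empty carriers,
    so num_commodities = K + 1.
    """
    supply = [0.0] * num_commodities  # s^l_k: flow of commodity k into the node
    demand = [0.0] * num_commodities  # d^l_k: flow of commodity k out of the node
    for (i, j), flows in f.items():
        for k, amount in enumerate(flows):
            if j == node:
                supply[k] += amount
            if i == node:
                demand[k] += amount
    residual = sum(supply) - sum(demand)  # r^l = e^T s^l - e^T d^l
    return supply, demand, residual


def dummy_totals(residual):
    """Right-hand sides for the dummy commodity K + 1 at one node.

    A shortfall (r < 0) is bought from the dummy row; an excess inflow (r > 0)
    is sold to the dummy column. Both carry the penalty price M_l.
    """
    dummy_row = max(0.0, -residual)    # sum over j of x_{(K+1) j}
    dummy_column = max(0.0, residual)  # sum over i of x_{i (K+1)}
    return dummy_row, dummy_column


# Hypothetical flows with K = 1: loaded trucks arrive at B faster than
# empty trucks leave it, so conservation of flows is violated at B.
f = {("A", "B"): [0.0, 2.0], ("B", "C"): [1.0, 0.0]}
s, d, r = node_balance(f, "B", num_commodities=2)
row, col = dummy_totals(r)
print(s, d, r)     # [0.0, 2.0] [1.0, 0.0] 1.0
print(row, col)    # 0.0 1.0
# Total supply (rows) equals total demand (columns) once the dummy is added.
print(sum(s) + row == sum(d) + col)  # True
```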

3.1.2 Choosing new flows

The essential ingredient of the Bayesian approach is that the flows are generated
randomly from piecewise constant probability distributions that are periodically
updated. Figure 10 illustrates the point generation.

Assume that $g$ (short for $g_{ij}^k$) is the current piecewise constant probability
density, the Bayesian prior. Then sample $f_{ij}^k$ from this distribution using uniformly
distributed (0, 1) random numbers. The number first determines the appropriate
piecewise linear segment of the cumulative distribution and then gives the actual
location within the segment. For instance, if the random number is $L$, then determine
$j$ such that

$$\sum_{i=1}^{j-1} a_i \le L < \sum_{i=1}^{j} a_i,$$

where $a_i$ is the area (probability mass) of the $i$-th rectangle of $g$. This gives the
appropriate segment. Horizontally project $L' = L - \sum_{i=1}^{j-1} a_i$ onto the (increasing)
diagonal of the $j$-th rectangle, and then vertically project the point down onto the
$x$ axis. This gives the sampled point.

Figure 10: Example of a random point selection from a piecewise constant prior $g$.
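The projection procedure above is inverse-transform sampling, and can be sketched as below, assuming the prior is stored as bin boundaries plus positive bin masses $a_i$; the function name `sample_piecewise_constant` and the example bins are hypothetical.

```python
import bisect
import random


def sample_piecewise_constant(edges, weights, u):
    """Draw a point from a piecewise constant density by inverting its CDF.

    edges:   bin boundaries x_0 < x_1 < ... < x_m
    weights: positive bin masses a_1, ..., a_m (normalized here)
    u:       a uniform (0, 1) random number (the L of the text)
    """
    total = sum(weights)
    areas = [w / total for w in weights]
    cumulative = []
    running = 0.0
    for a in areas:
        running += a
        cumulative.append(running)
    # Segment j: the first rectangle whose cumulative mass exceeds u.
    j = min(bisect.bisect_right(cumulative, u), len(areas) - 1)
    below = cumulative[j - 1] if j > 0 else 0.0
    # Project the leftover mass L' = u - below onto the diagonal of rectangle j,
    # then down onto the x axis; the clamp guards against round-off near u = 1.
    fraction = min(max((u - below) / areas[j], 0.0), 1.0)
    return edges[j] + fraction * (edges[j + 1] - edges[j])


edges = [0.0, 1.0, 2.0, 4.0]
weights = [1.0, 2.0, 1.0]
print(sample_piecewise_constant(edges, weights, 0.5))   # 1.5
rng = random.Random(42)
draws = [sample_piecewise_constant(edges, weights, rng.random()) for _ in range(5)]
```

Here the middle bin holds half the mass, so $u = 0.5$ falls a quarter of the way into it and maps to the midpoint of $[1, 2]$.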

The entire vector of flows, $f$, is determined in this way, each coordinate according
to its own prior distribution. Then $f$ is rescaled to unit length. These
are the locations that are eventually charged with success or failure. Next, a
random number $n$ between 0 and $N$ is chosen according to its own prior (we
can also maximize over the interval), and the flows are scaled again so that
$\sum_{i,j} F_{ij} = n$ (compare (2)).² This flow is submitted to the “oracle”: if the oracle
accepts the proposal, we mark a success in the appropriate constant-interval
“bin”; otherwise we mark a failure. Finally, after enough statistics are collected,
we use the Bayesian formula to update the priors. The “oracle” in our case is
the energy function of Section 3.1.1, the optimal return for a set of flows.
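The text does not spell out the form of the Bayesian update, so the following is only one plausible reading, labeled as an assumption: each bin keeps counts of successes and failures, a uniform Beta(1, 1) prior is placed on each bin's acceptance probability, and the new density in a bin is taken proportional to the posterior mean $(s+1)/(s+f+2)$. The resulting masses can serve as the next piecewise constant prior. The helper names `record` and `update_prior` and the example outcomes are hypothetical.

```python
import bisect


def record(edges, successes, failures, x, accepted):
    """Charge the bin containing x with a success or a failure."""
    b = min(bisect.bisect_right(edges, x) - 1, len(successes) - 1)
    if accepted:
        successes[b] += 1
    else:
        failures[b] += 1


def update_prior(edges, successes, failures):
    """Hypothetical Bayesian update of a piecewise constant prior.

    Assumes a uniform Beta(1, 1) prior on each bin's acceptance probability, whose
    posterior mean is (s + 1) / (s + f + 2). The new density in each bin is taken
    proportional to that mean; the result is a list of bin masses summing to 1.
    """
    heights = [(s + 1) / (s + f + 2) for s, f in zip(successes, failures)]
    masses = [h * (edges[b + 1] - edges[b]) for b, h in enumerate(heights)]
    total = sum(masses)
    return [m / total for m in masses]


edges = [0.0, 1.0, 2.0]
successes, failures = [0, 0], [0, 0]
for x, ok in [(0.2, True), (0.7, True), (1.5, False), (1.9, False), (0.4, False)]:
    record(edges, successes, failures, x, ok)
print(successes, failures)                        # [2, 0] [1, 2]
print(update_prior(edges, successes, failures))   # masses 12/17 and 5/17
```

The Laplace-style "+1" terms keep every bin's mass positive, so no region of the flow space is ruled out permanently after a few unlucky proposals.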

3.2 Mechanisms of Price Modification

In the previous section, we described a method for generating a resupply plan
based on the idea that a stable set of flows may be established, and that delivery
vehicles should use this flow information as a guide in deciding how to reach an
appropriate destination for the goods they carry. In setting up this kind of plan,
we made use of a market model in which individual delivery vehicles could base
their decisions on their expected cost for goods and travel versus their expected
payoff upon delivery.

² Choosing $n < N$ corresponds to leaving some agents idle.
Once we find a set of flows that satisfies our expected delivery requirements,
we still need to “enforce” this set of flows in a real market subject to
random shocks and fluctuations. The well-known economic principle of supply
and demand is well suited to this enforcement. In the resupply problem, it
translates into reducing the payoff at nodes that are oversupplied and
increasing the payoff at nodes that are undersupplied.

Recall  that  each  agent  starts  with  an  internalized  plan,  which  is  a  set  of 
flows  it  expects  to  see.  An  agent  also  gets  input  from  the  environment  about 
how  its  deliveries  are  actually  meeting  customer  needs.  We  characterize  this 
in  economic  terms  as  the  price  received  for  the  commodities.  For  agents  to 
respond  intelligently  to  a  changing  environment,  prices  must  change  as  various 
nodes  receive  and  consume  supplies. 

Intuitively, such price manipulation should, in general, result in flow corrections that locally improve the flows. We could implement such a policy by any ad hoc rule that decreases the price as a node becomes oversupplied and increases the price as a node uses its supplies and requires more. However, without some global guidance for implementing these price changes, it is not immediately clear whether the flows will vary in an orderly manner. The flow changes could just as easily be disproportionate, unstable, or otherwise problematic. Thus, in addition to the qualitative knowledge (that the prices need to decrease with respect to the supply delivery rates), we also need an analytical expression according to which we can modify the prices. Only then, through the market mechanism, can we truly control the flows of supplies.
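The danger of an unguided ad hoc rule can be seen even at a single node. In the hypothetical sketch below, deliveries to the node grow linearly with price ($s = k p$), consumption is a constant $d$, and the price is corrected by $p \leftarrow p - \gamma (s - d)$, which lowers the price when the node is oversupplied. The equilibrium price is $d/k$, and the price error is multiplied by $1 - \gamma k$ at every step, so the same qualitatively correct rule converges for $0 < \gamma k < 2$ but oscillates with growing amplitude for $\gamma k > 2$. The model and all parameter names are assumptions made for illustration.

```python
def simulate_prices(gain, k=1.0, demand=5.0, p0=1.0, steps=50):
    """Iterate a hypothetical ad hoc price rule at a single node.

    Deliveries are assumed to grow linearly with price (supply = k * p), and
    the node consumes a constant `demand`. Oversupply lowers the price and
    undersupply raises it, as the qualitative rule requires.
    """
    p = p0
    history = [p]
    for _ in range(steps):
        supply = k * p
        p = p - gain * (supply - demand)   # the error is multiplied by (1 - gain * k)
        history.append(p)
    return history
```

For example, `simulate_prices(0.5)` settles at the equilibrium price 5, whereas `simulate_prices(2.5)` diverges even though both runs apply the same qualitative rule.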

Two analytical price-changing mechanisms are introduced here: the quadratic-payoff economy and the Cobb-Douglas economy. Both price-changing mechanisms are intended for the general resupply problem (the general time-delay functions); however, for the purpose of this analysis, we assume that the time delays associated with arcs are constant. Additional simplifications are made for the sake of clarity (and tractability), but the basic principles should carry over to more general resupply markets.

3.2.1 Relationships and Contrasts between Centralized and Distributed Planning

It is imperative to clearly distinguish between the various optimization and equilibrium problems that we now encounter. Although, from an abstract point of view, equilibrium programming and mathematical optimization are essentially equivalent, their relationships are often subtle. Let us analyze the resupply problem.

To simplify the discussion, assume that there is only one type of commodity. Because all feasible flows (circulations) can be decomposed into simple circulations (conservation of flow), let us, for the analysis, assume that $x_1, x_2, \ldots, x_l$ are all simple circulations; associated with each such circulation is a (decreasing) payoff function $p_i(x_i)$ and a time delay constant $c_i$. Denote by $x$, $p(x)$, and $c$ the vectors of circulations, payoffs, and time delays, respectively. The resupply market equilibrium problem arises from the following proposition.

Proposition 1. If each delivery truck maximizes its own payoff rate, then these trucks effectively induce a vector of simple circulations, $x^* \ge 0$, such that for all $i$ and all $j$ with $x_j^* > 0$,

$$\frac{p_i(x_i^*)}{c_i} \le \frac{p_j(x_j^*)}{c_j}, \tag{8}$$

$$\frac{p_j(x_j^*)}{c_j} \ge 0, \tag{9}$$

and such that $c^T x^* \le N$.
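The conditions of Proposition 1 are easy to test for a candidate vector of circulation levels. The checker below is a sketch, not part of the method described here: it takes the payoff functions as hypothetical Python callables, tests (8) and the budget $c^T x \le N$, and also enforces the requirement, stated right after the proposition, that trucks may stay idle only when no circulation offers a positive payoff rate. Condition (9) is not checked separately.

```python
def is_equilibrium(x, p, c, N, tol=1e-9):
    """Test the equilibrium conditions of Proposition 1 for circulation levels x.

    p[i] is a callable returning the payoff p_i(x_i); c[i] is the time delay c_i.
    """
    if any(xi < -tol for xi in x):
        return False                                  # circulations must be nonnegative
    load = sum(ci * xi for ci, xi in zip(c, x))
    if load > N + tol:
        return False                                  # violates c^T x <= N
    rates = [pi(xi) / ci for pi, xi, ci in zip(p, x, c)]
    best = max(rates)
    # (8): every circulation in use earns the best payoff rate on offer
    if any(xi > tol and r < best - tol for xi, r in zip(x, rates)):
        return False
    # trucks may stay idle only if no circulation offers a positive payoff rate
    if load < N - tol and best > tol:
        return False
    return True
```

With the hypothetical payoffs $p_1(x) = 4 - x$, $p_2(x) = 3 - x$, unit delays and $N = 3$, only $x = (2, 1)$ passes: both circulations then pay 2 per unit time and all trucks are in use.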

The equilibrium problem is to find such an $x^*$. Note that (8) implies that if $x_i^* > 0$ and $x_j^* > 0$, then $p_i(x_i^*)/c_i = p_j(x_j^*)/c_j$. All trucks also must be used, $c^T x^* = N$, unless $p_i(x_i^*) = 0$ for all $i$. If

$$P_i(x_i) = \int_0^{x_i} p_i(s)\,ds, \tag{10}$$

the Stieltjes integral of $p_i(x_i)$, then we can associate the mathematical program

$$\max_{x \ge 0} \sum_{i=1}^{l} P_i(x_i), \quad \text{such that } c^T x \le N \tag{11}$$

with the equilibrium problem (8). It follows from the Lagrange conditions that the set of solutions of (8) contains the set of solutions of (11). Moreover, since $p(x)$ is a collection of decreasing functions, (11) is a strictly "convex"³ mathematical program; thus a unique $x^*$ solves each problem.
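For linear, decreasing payoffs $p_i(x_i) = b_i - a_i x_i$ (the form used later in Section 3.2.2), the Lagrange conditions of (11) reduce to $p_i(x_i) = \lambda c_i$ on the support and $p_i(0) \le \lambda c_i$ elsewhere, which is exactly the equal-payoff-rate condition (8). The sketch below assumes linear payoffs and a positive budget $N$ and finds the multiplier $\lambda$ by bisection, which works because the total load $\sum_i c_i x_i(\lambda)$ is non-increasing in $\lambda$. The function name is hypothetical.

```python
def solve_program_11(a, b, c, N, iters=200):
    """Maximize sum_i P_i(x_i) s.t. c^T x <= N for linear payoffs p_i(x) = b_i - a_i x.

    Returns (x, lam), where lam is the Lagrange multiplier of the budget constraint.
    """
    def x_of(lam):
        # each circulation is used until its payoff rate p_i / c_i drops to lam
        return [max(0.0, (bi - lam * ci) / ai) for ai, bi, ci in zip(a, b, c)]

    def load(lam):
        return sum(ci * xi for ci, xi in zip(c, x_of(lam)))

    if load(0.0) <= N:
        return x_of(0.0), 0.0          # budget not binding: payoffs are driven to zero
    lo, hi = 0.0, max(bi / ci for bi, ci in zip(b, c))   # at hi every x_i is zero
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if load(mid) > N:
            lo = mid                   # too many trucks demanded: raise the threshold
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return x_of(lam), lam
```

At the returned point every used circulation has payoff rate $p_i(x_i)/c_i = \lambda$, so the optimizer of (11) satisfies (8), as the Lagrange argument predicts.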

There is another optimization problem associated with the market which is not, in general, equivalent to (8) and (11). However, the new problem is equivalent under certain assumptions and will be useful in subsequent derivations. Let $Q_i(x_i) = x_i \cdot p_i(x_i)$. Then the new problem is

$$\max_{x \ge 0} \sum_{i=1}^{l} Q_i(x_i) = x^T p(x), \quad \text{such that } c^T x \le N. \tag{12}$$

³ Actually, it is the equivalent minimization problem,

$$\min_{x \ge 0} \; -\sum_{i=1}^{l} P_i(x_i), \quad \text{such that } c^T x \le N,$$

that is convex.




The solutions of this possibly nonconvex problem give the maximum total payoff rate achievable by the grand coalition of all trucks cooperating together.⁴ Denote a solution of (12) by $\hat{x}$.

Typically, $\hat{x}$ is a suboptimal solution of (11), and $x^*$ is a suboptimal solution of (12). If $x^*$ is the desired set of circulations, then we need to prevent collusion among the agents: either explicitly, by some sort of "antitrust law" (where the agents are explicitly prohibited from forming coalitions), or implicitly, by creating a market such that (11) and (12) are both solved by $\hat{x} = x^*$. Although it is not immediately clear whether this can be done, the latter alternative can be accomplished, since we control the market mechanisms. We can, in fact, create markets which have some additional attractive properties as well. For instance, we can create price-changing mechanisms such that if the number of available trucks changes from $N$ to $\alpha N$ ($\alpha > 0$), then the solution of problem (11) (and of (12)) changes proportionally from $x^*$ to $\alpha x^*$.
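A two-circulation example makes the gap between (11) and (12) concrete. The sketch assumes unit time delays ($c = e$), the linear prices $p_i(x_i) = b_i - a_i x_i$ of Section 3.2.2, a binding budget, and an interior solution (it raises an error otherwise). The marginal of $P_i$ is $b_i - a_i x_i$, while the marginal of $Q_i$ is $b_i - 2 a_i x_i$, so both programs are solved by equalizing marginals, only with different slopes. All function names are hypothetical.

```python
def interior_allocation(slopes, b, N):
    """Solve b_i - slopes_i * x_i = lam for all i with sum(x) = N (c = e assumed)."""
    inv = [1.0 / s for s in slopes]
    lam = (sum(bi * v for bi, v in zip(b, inv)) - N) / sum(inv)
    x = [(bi - lam) / s for bi, s in zip(b, slopes)]
    if lam < 0 or min(x) <= 0:
        raise ValueError("budget not binding or solution not interior")
    return x

def individual_optimum(a, b, N):
    # program (11): the marginal of P_i is p_i(x_i) = b_i - a_i x_i
    return interior_allocation(a, b, N)

def coalition_optimum(a, b, N):
    # program (12): the marginal of Q_i = x_i p_i(x_i) is b_i - 2 a_i x_i
    return interior_allocation([2.0 * ai for ai in a], b, N)

def total_payoff_rate(x, a, b):
    # sum_i Q_i(x_i), the grand coalition's objective
    return sum(xi * (bi - ai * xi) for xi, ai, bi in zip(x, a, b))
```

With $a = (1, 1)$, $b = (4, 3)$ and $N = 3$, the individually optimal circulations are $x^* = (2, 1)$ with total payoff rate 6, while the grand coalition chooses $\hat{x} = (1.75, 1.25)$ and earns 6.125. Trucks on the second circulation then earn 1.75 instead of the 2.25 available on the first, the kind of sacrifice described in footnote 4.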

3.2.2  Quadratic-Payoff  Economy 

A simple change of variables, $x_i \leftarrow c_i x_i$, allows us to assume that instead of the general time delay vector $c$, all time delays are the same, $c = e$. Price functionals of the quadratic-payoff economies are of the form

$$p_i(x_i) = b_i - a_i x_i, \tag{13}$$

where $a_i$ and $b_i$ are positive numbers. Therefore the components of the objective function of program (11) are

$$P_i(x_i) = b_i x_i - a_i \frac{x_i^2}{2}, \tag{14}$$

and the components of the objective function of program (12) are

$$Q_i(x_i) = b_i x_i - a_i x_i^2. \tag{15}$$

Note the close relationship between the functions:

$$Q_i(x_i) = P_i(x_i) - a_i \frac{x_i^2}{2}. \tag{16}$$

Now, assume that $g$ is the desirable set of (goal) circulations and note that, by definition, $e^T g = N$. Without loss of generality, assume that $g$ is positive, because otherwise it suffices to work with the support of $g$.

⁴ Some payoff structures can be exploited by teams of agents if some agents are willing to sacrifice their own earnings in order to improve the overall earnings of the group. The earnings of such agents decrease, but this is more than adequately compensated by the increase in the earnings of the other agents.
Lemma 2. If

$$p_i(\alpha g_i) = p_j(\alpha g_j), \quad \text{for all } i, j \text{ and all } \alpha > 0, \tag{17}$$

and if

$$a_i g_i = a_j g_j, \quad \text{for all } i, j, \tag{18}$$

then $\alpha g$ is the unique solution of

$$\max_{x \ge 0} \sum_{i=1}^{l} P_i(x_i), \quad \text{such that } e^T x \le \alpha N \tag{19}$$

and

$$\max_{x \ge 0} \sum_{i=1}^{l} Q_i(x_i), \quad \text{such that } e^T x \le \alpha N \tag{20}$$

for any positive $\alpha$.

Proof: Since both mathematical programs, (19) and (20), are strictly "convex" (and bounded), they have unique solutions. If $p(x)$ satisfies (17), then $\alpha g$ is the unique solution to (19). Then, in view of (16), note that for all $i$,

$$\frac{dQ_i(x_i)}{dx_i} = p_i(x_i) - a_i x_i. \tag{21}$$

From (17) and (18) it thus follows that if $x^* = \alpha g$, then

$$\frac{dQ_i(x_i^*)}{dx_i} = \frac{dQ_j(x_j^*)}{dx_j}, \quad \text{for all } i, j. \tag{22}$$

Verify the Lagrange conditions of (20) to complete the proof. □


All that now remains is to notice that if for all $i$ we set $b_i = B$, a positive constant, and $a_i = B/g_i$, then $p(x)$ satisfies both functional conditions, (17) and (18): $p_i(\alpha g_i) = B(1 - \alpha)$ and $a_i g_i = B$ for every $i$. Thus we have succeeded in creating a market of the desirable type.
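The construction can be checked numerically. The sketch below builds the market $b_i = B$, $a_i = B/g_i$ for a hypothetical goal vector $g$ and verifies, as in the proof of Lemma 2, that at $x = \alpha g$ the marginals $dP_i/dx_i = p_i(x_i)$ and $dQ_i/dx_i = p_i(x_i) - a_i x_i$ from (21) are the same for every $i$. Together with the budget $e^T x = \alpha N$, which the sketch assumes is met with equality, equal marginals are the Lagrange conditions of (19) and (20). Function names are illustrative only.

```python
def lemma2_market(g, B):
    """Price functionals p_i(x) = b_i - a_i x with b_i = B and a_i = B / g_i."""
    return [B / gi for gi in g], [B] * len(g)

def marginals(x, a, b):
    dP = [bi - ai * xi for xi, ai, bi in zip(x, a, b)]        # dP_i/dx_i = p_i(x_i), from (14)
    dQ = [bi - 2.0 * ai * xi for xi, ai, bi in zip(x, a, b)]  # dQ_i/dx_i, from (15) and (21)
    return dP, dQ

def all_equal(values, tol=1e-9):
    return max(values) - min(values) <= tol
```

For $g = (1, 2, 5)$ and $B = 10$, the common marginals at $x = \alpha g$ are $B(1 - \alpha)$ for $P$ and $B(1 - 2\alpha)$ for $Q$, so both programs are balanced at the same point $\alpha g$.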


3.2.3  Cobb-Douglas  Economy 

Let all basic assumptions about the simple circulation market be as in Section 3.2.2. Let $a$ be a positive vector. In the Cobb-Douglas economy, we consider

$$\max_{x \ge 0} \prod_{i=1}^{l} x_i^{a_i}, \quad \text{such that } e^T x \le \alpha N. \tag{23}$$

This is clearly equivalent (after taking the log of the objective function) to

$$\max_{x > 0} \sum_{i=1}^{l} a_i \log(x_i), \quad \text{such that } e^T x \le \alpha N. \tag{24}$$

Note that for each $i$

$$P_i(x_i) = a_i \log(x_i), \tag{25}$$

$$p_i(x_i) = \frac{a_i}{x_i}, \tag{26}$$

and

$$Q_i(x_i) = a_i. \tag{27}$$

It follows from the Lagrange conditions that, when $e^T a = N$, (24) is uniquely solved by $x^* = \alpha a$. Since all $Q_i$ are constant, any feasible $x$ trivially solves the "grand coalition problem" (12). Thus it suffices to set $a = g$ to achieve the desirable results. □
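The Cobb-Douglas allocation has a closed form: the Lagrange conditions $a_i / x_i = \lambda$ give $x_i = \alpha N a_i / \sum_j a_j$, which equals $\alpha a$ when $e^T a = N$. The sketch below (function names are hypothetical) computes this allocation with $a = g$ and confirms that the prices (26) are equal across circulations while the returns (27) are the constants $a_i$, whatever the allocation.

```python
def cobb_douglas_allocation(a, budget):
    """Maximize sum_i a_i log x_i subject to sum_i x_i = budget (a_i / x_i = lam)."""
    total = sum(a)
    return [budget * ai / total for ai in a]

def prices_and_returns(a, x):
    p = [ai / xi for ai, xi in zip(a, x)]   # (26): p_i(x_i) = a_i / x_i
    Q = [xi * pi for xi, pi in zip(x, p)]   # (27): Q_i(x_i) = x_i p_i(x_i) = a_i
    return p, Q
```

For $g = (1, 2, 5)$, $N = 8$ and $\alpha = 0.5$, the allocation is $(0.5, 1, 2.5) = \alpha g$, every price equals 2, and the returns are $(1, 2, 5) = g$.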

3.3  Decomposing  Graphs  into  Cycles 

We can use the notion of a cycle for two different purposes. First, autonomous agents can use cycle decompositions of desired flows as bases for their reasoning. On the basis of the payoff on each cycle, individual agents choose (and switch between) cyclical routes along which they deliver supplies. Second, we can use cycles to generate feasible flows, a technique which, under certain simplifying assumptions, greatly reduces the number of dimensions of the space that needs to be sampled.

A vector of feasible flows in the resupply problem is a circulation. That is, any vector of (aggregated) feasible flows can be decomposed into a sum of cycles (see Figure 11). The feasible flows then result from the superposition of the cycles. A cycle is simple if no vertex is repeated until the cycle completes and the flow is the same on every arc in the cycle. All cycles in Figure 11 are simple and, in fact, any circulation can be decomposed into simple cycles. Note that when a flow is broken into cycles, a single arc may participate in several different cycles.
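Superposition is what makes cycles useful for sampling: any nonnegative weighting of a fixed set of cycles is automatically a circulation, so a sampler can draw one number per cycle instead of one per arc. The sketch below represents a cycle as a hypothetical list of node labels, superposes weighted cycles into arc flows, and checks flow conservation at every node.

```python
import random
from collections import defaultdict

def superpose(cycles):
    """Add up weighted cycles; cycles is a list of (node_list, amount) pairs."""
    flow = defaultdict(float)
    for nodes, amount in cycles:
        # arcs u0->u1, ..., u_{k-1}->u0 all carry the same amount
        for u, v in zip(nodes, nodes[1:] + nodes[:1]):
            flow[(u, v)] += amount
    return dict(flow)

def is_circulation(flow, tol=1e-9):
    """True if every arc flow is nonnegative and inflow equals outflow at every node."""
    balance = defaultdict(float)
    for (u, v), f in flow.items():
        if f < -tol:
            return False
        balance[u] -= f
        balance[v] += f
    return all(abs(bal) <= tol for bal in balance.values())

def random_circulation(cycle_nodes, rng):
    """Sample one nonnegative weight per cycle instead of one flow per arc."""
    return superpose([(nodes, rng.random()) for nodes in cycle_nodes])
```

Whatever weights `random_circulation` draws, the result passes `is_circulation`, so feasibility never has to be rejected after the fact.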

The decomposition of a feasible flow into simple cycles can be achieved by the following method. Follow the given cycle until the first repetition of a vertex occurs. That completes a simple cycle. Subtract this cycle from the flow and repeat the procedure until the decomposition is complete. Figure 12 gives an idea of how this is done. More generally, given a circulation, remove from the graph all arcs with zero flow. Then walk randomly (on the arcs with positive flow) until the first time a node is repeated. This determines some simple cycle. Let the least flow on the arcs of this cycle become the cycle's flow. Subtract this flow from the circulation and, again, remove from the graph all arcs with zero flow. The remaining flow is also a circulation. If the resulting graph contains no arcs, then we are done; otherwise, repeat the procedure. Since at every stage at least one arc becomes zero, if $A$ denotes the set of arcs, then this algorithm terminates in at most $O(|A|)$ stages.⁵

Figure 11: Example of a decomposition of a feasible flow into cycles. Numbers on the arcs indicate the amount of flow on the arc. In the second graph, dashed lines indicate circular flows of constant amount.

Figure 12: A cycle decomposed into simple cycles.
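The decomposition procedure described above translates directly into code. This is a sketch, not the implementation used in this work: the circulation is a dictionary mapping arcs $(u, v)$ to flow, the random walk uses Python's `random` module with a fixed seed, and exact (for example integer-valued) flows are assumed so that floating-point residues do not break conservation.

```python
import random
from collections import defaultdict

def decompose_circulation(flow, tol=1e-9, seed=0):
    """Split a circulation {(u, v): amount} into simple cycles [(nodes, amount), ...]."""
    rng = random.Random(seed)
    # remove from the graph all arcs with zero flow
    remaining = {arc: f for arc, f in flow.items() if f > tol}
    cycles = []
    while remaining:
        out = defaultdict(list)
        for u, v in remaining:
            out[u].append(v)
        # walk randomly on positive-flow arcs until a node repeats
        start = next(iter(remaining))[0]
        path, position = [start], {start: 0}
        while True:
            nxt = rng.choice(out[path[-1]])   # conservation guarantees an outgoing arc
            if nxt in position:
                cycle = path[position[nxt]:]  # the simple cycle closed by the repeat
                break
            position[nxt] = len(path)
            path.append(nxt)
        arcs = list(zip(cycle, cycle[1:] + cycle[:1]))
        amount = min(remaining[arc] for arc in arcs)   # least flow becomes the cycle's flow
        for arc in arcs:
            remaining[arc] -= amount
            if remaining[arc] <= tol:
                del remaining[arc]                      # at least one arc drops out
        cycles.append((cycle, amount))
    return cycles
```

Because the arc carrying the least flow on each extracted cycle is removed at every stage, the outer loop runs at most $|A|$ times, matching the bound stated above.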

4 Knowledge Acquisition: Evaluating the Appropriateness of Our Model

In  this  section,  we  present  an  analysis  of  the  insights  gained  from  discussions 
with  Captain  Dennis  Szydloski  and  Captain  Gary  Krzisnik  at  Fort  Knox.  During 
two  days  of  meetings,  we  discussed  a  range  of  topics  relating  to  the  use  of 
cooperating  autonomous  agents  within  the  military.  Our  primary  emphasis  was 
on  the  development  of  suitable  scenarios  for  resupply,  but  we  also  discussed  the 
possibilities  of  developing  autonomous  tactical  vehicles  which  might  be  able  to 
cooperate  with  their  human  counterparts  on  the  battlefield. 

⁵ The outlined algorithm can be significantly streamlined.
From our discussions about military resupply, we found that the overall problem presents many interesting and challenging complexities. The Army categorizes supplies into ten different classes. Items in most of these classes are dispatched in response to specific orders or requests. However, classes I, III, and V, which include rations, fuel, and ammunition, all tend to be allocated on the basis of a combination of future consumption estimates and updates of current demand. The most interesting case is ammunition resupply, in which a required supply rate (RSR) is established at regular intervals to identify the amounts of ammunition needed to sustain operations. This figure, however, must be balanced against a controlled supply rate (CSR). The controlled supply rate is established on the basis of availability, facilities, and transportation for a given period. It is not uncommon for a battalion's CSR to be less than its RSR. When this is the case, the overall allocation of supplies is established by the division commander and is then re-allocated by each subordinate commander.

We  believe  that  because  rations,  fuel,  and  ammunition  all  have  relatively 
stable  rates  of  consumption,  the  resupply  problem  for  these  commodities  may 
be  suitably  represented  as  a  flow  problem  which  can  be  solved  by  autonomous 
agents.  Our  representation  of  this  problem,  however,  must  take  into  account  the 
dynamic  nature  of  the  battlefield  environment.  Resupply  efforts  must  function 
smoothly  in  the  presence  of  troop  degradation  and  rapid  troop  movement.  If 
autonomous  agents  are  to  assume  a  resupply  role,  they  will  need  the  ability 
to  compensate  for  the  loss  of  their  team  members  and  the  ability  to  function 
effectively  even  if  their  commanding  headquarters  is  incapacitated.  From  this 
perspective,  our  flow  representation  discussed  in  previous  reports  is  a  good 
candidate  solution. 

Despite its appeal from the standpoint of adaptability, we found that our flow representation does not fit easily within the military infrastructure. In our flow model, it was assumed that agents could dynamically choose their paths to delivery and supply points from a variety of alternate paths in a road network. Instead, we were told that supply routes are typically fixed and heavily defended, so there is rarely a desire to choose alternate paths. In addition, we learned that supplies are often carefully pre-packaged in the exact quantities needed for their specific destinations, allowing delivery vehicles to unload and get out of threatening areas as quickly as possible. In our autonomous agent approach, we provide opportunism by allowing agents flexibility with regard to when and where they may deliver their cargo. As a consequence, supplies cannot easily be pre-packaged for delivery to specific sites.

Another aspect of resupply which complicates matters from the standpoint of autonomous delivery is that battalions typically use two different methods to obtain supplies: supply point distribution and unit distribution. In supply point distribution, a unit uses its own vehicles to go to a central supply point and pick up supplies. In unit distribution, supplies are delivered to units by transportation assets other than their own. Under our scenarios for autonomous supply delivery, we assumed a model based purely on unit distribution. To allow for both types of distribution, autonomous agents may have to function cooperatively with the human agents that are performing their own supply point distribution tasks.

It  may  also  be  worth  investigating  whether  the  use  of  autonomous  agents  for 
resupply  might  alter  some  of  the  above  restrictions  and  methodologies,  thereby 
allowing  our  more  dynamic  supply  model  to  be  implemented.  For  example,  if 
autonomous  agents  are  more  expendable  than  their  human  counterparts,  it  may 
be  possible  to  allow  them  to  disperse  and  take  advantage  of  a  variety  of  supply 
routes.  While  this  would  be  harder  to  defend,  it  would  be  more  reliable  since 
cutting  off  any  single  supply  route  could  not  stop  the  flow  of  supplies.  Similarly, 
the  urgency  to  get  supply  vehicles  out  of  high  risk  territory  may  be  reduced  if 
these  vehicles  are  unmanned.  If  a  delivery  vehicle  can  spend  the  time  to  deliver 
supplies  to  each  fighting  vehicle,  then  the  overall  risk  to  humans  can  be  greatly 
reduced. 

In order to properly evaluate the possibilities for doctrinal change afforded by autonomous delivery agents, it would be necessary to have a highly realistic simulation of the resupply problem so that existing techniques could be compared to the alternatives. Existing doctrine is often established on the basis of many complex constraints that are not always apparent to the uninitiated. Simulation can often reveal the effects of these constraints, but only if they are correctly included within the simulation model. For the military resupply problem, one of the most important, yet most difficult to model, factors is the interaction between man and machine.

5  Using  SIMNET  as  a  Testbed 

While  at  Fort  Knox,  we  also  investigated  the  possibility  of  using  the  SIMNET 
simulation  environment  as  a  testbed  for  our  multi-agent  research.  SIMNET 
provides  a  unique  environment  where  military  teams  up  to  the  size  of  a  battalion 
can  train  for  battle  without  using  real  military  vehicles.  The  simulation  creates 
sufficiently  realistic  sensory  input  for  human  trainees  that  they  can  learn  a  great 
deal  about  how  to  function  as  an  effective  team  as  they  engage  in  simulated 
battle  scenarios.  We  explored  the  possibilities  of  either  integrating  a  team  of 
resupply  agents  or  individual  fighting  agents  into  the  SIMNET  environment. 

SIMNET is structured as a highly distributed network of simulators. Each simulator is manned by a human crew, and the actions of the crew are transmitted through the network so that the world representation is consistent among all simulators. The visual displays presented to each tank crew are simple, but sufficient to provide a good sense of terrain and other vehicles. Because of its highly distributed nature, it is possible to insert a completely automated player into the SIMNET network. This is currently being done by BBN for their simulation of semi-automated opposing forces.

In exploring the possibility of automating the resupply task within SIMNET, we found that, currently, supply trucks are simply made to appear at destinations designated by a supply commander. To study interesting supply problems, the simulation would need to support the operation of supply vehicles through all phases of their delivery task. Another problem with simulating supply is the overall duration of a SIMNET training session: a session rarely lasts long enough for resupply to become an important issue in the exercise.

SIMNET would be especially useful for studying how autonomous agents can be incorporated into a platoon. SIMNET provides the opportunity to study the substitution of an autonomous system for a tank crew. The issues of interest would revolve around what capabilities such a system would need in order to allow the human platoon members to fight effectively. The autonomous system would clearly need to be able to interpret and cooperate with the platoon leader. It would also need to perform many sensing and decision-making tasks on its own. By using SIMNET, it may be possible for military planners to explore different roles of autonomous systems before committing to large expenditures on hardware.

6  Conclusions 

We feel that the approach developed during the first year is technically sound. It provides a framework for creating and executing flexible resupply plans in a changing environment. However, after our knowledge acquisition at Fort Knox, we found that our initial formulation was simple compared with the complexities of real resupply problems. Demonstrating the feasibility of this approach would require substantial knowledge acquisition to create the mathematical formalisms, as well as extensive coding and testing of algorithms. We view this as a high-risk endeavor.

A better application of this technology would be the Semi-Automated Forces (SAFOR) in SIMNET. Such a shift in emphasis serves two purposes. First, we can develop multi-agent planning algorithms that can be used in the long term for autonomous and semi-autonomous combat vehicles. Second, in the short term, we can use our algorithms to decrease the number of humans needed for opposing forces in a SIMNET exercise.